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COMBINING GEOMETRY AND COMBINATORICS: 
A UNIFIED APPROACH TO SPARSE SIGNAL RECOVERY 

R. BERINDE, A. C. GILBERT, P. INDYK, H. KARLOFF, AND M. J. STRAUSS 

Abstract. There are two main algorithmic approaches to sparse signal recovery: geometric and 
combinatorial. The geometric approach starts with a geometric constraint on the measurement 

p^ ] matrix <1> and then uses linear programming to decode information about x from Qx. The com- 

pvj . binatorial approach constructs 3> and a combinatorial decoding algorithm to match. We present a 

unified approach to these two classes of sparse signal recovery algorithms. 

The unifying elements are the adjacency matrices of high-quality unbalanced expanders. We 

^A . generalize the notion of Restricted Isometry Property (RIP) , crucial to compressed sensing results for 

signal recovery, from the Euclidean norm to the £ p norm for p w 1, and then show that unbalanced 

ON . expanders are essentially equivalent to RIP-p matrices. 

From known deterministic constructions for such matrices, we obtain new deterministic mea- 
surement matrix constructions and algorithms for signal recovery which, compared to previous 
deterministic algorithms, are superior in either the number of measurements or in noise tolerance. 



CZ2 . 

O , 1. Introduction 

With the rise in high-speed data transmission and the exponential increase in data storage, it 
is imperative that we develop effective data compression techniques, techniques which accomodate 
both the volume and speed of data streams. A new approach to compressing n-dimensional vec- 
^sO ■ tors (or signals) begins with linear observations or measurements. For a signal x, its compressed 

representation is equal to *&x, where $ is a carefully chosen m X n matrix, m -C n, often chosen 
at random from some distribution. We call the vector 3>x the measurement vector or a sketch of 
x. Although the dimension of <J>x is much smaller than that of x, it retains many of the essential 
OO ! properties of x. 

There are several reasons why linear compression or sketching is of interest. First, we can easily 
maintain a linear sketch <&x under linear updates to the signal x. For example, after incrementing 
the i-th coordinate Xi, we simply update the sketch as $>(x + e^) = *&x + <&ej. Similarly, we 
also easily obtain a sketch of a sum of two signals given the sketches for individual signals x 
and y, since <&(x + y) = *&x + <&y. Both properties are very useful in several computational areas, 
notably computing over data streams [AMS99, M ut03| Hnd07| . network measurement |EV03] . query 
optimization and answering in databases [AMS99J . 

Another scenario where linear compression is of key importance is compressed sensing [CRT06J 
IDon06 a] . a rapidly developing area in digital signal processing. In this setting, x is a physical 
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signal one wishes to sense (e.g., an image obtained from a digital camera) and the linearity of the 
observations stems from a physical observation process. Rather than first observing a signal in its 
entirety and then compressing it, it may be less costly to sense the compressed version directly 
via a physical process. A camera "senses" the vector by computing a dot product with a number 
of pre-specified measurement vectors. See |TLW + 06| |DDT + 08| for a prototype camera built using 
this framework. Other applications of linear sketching include database privacy |DMT07] . 

Although the sketch is considerably smaller than the original vector, we can still recover a large 
amount of information about x. See the surveys |Mut03l IInd07] on streaming and sublinear algo- 
rithms for a broad overview of the area. In this paper, we focus on retrieving a sparse approximation 
x* of x. A vector is called k-sparse if it has at most k non-zero elements in the canonical basis (or, 
more generally, k non-zero coefficients in some basis B). The goal of the sparse approximation is to 
find a vector x* such that the £ p approximation error \\x — x*|L is at most C > times the smallest 
possible £ q approximation error ||x — x'\\ q , where x' ranges over all fc-sparse vectors. Note that the 
error \\x — x'\\q is minimized when x' consists of the k largest (in magnitude) coefficients of x. 

There are many algorithms for recovering sparse approximations (or their variants) of signals from 
their sketches. The early work on this topic includes the algebraic approach of |Man92] (cf. JGGI + 02aj ). 
Most of the known algorithms, however, can be roughly classified as either combinatorial or geo- 
metric. 

Combinatorial approach. In the combinatorial approach, the measurement matrix $ is sparse 
and often binary. Typically, it is obtained from an adjacency matrix of a sparse bipartite random 
graph. The recovery algorithm proceeds by iteratively, identifying and eliminating "large" coeffi- 
cientaj of the vector x. The identification uses non-adaptive binary search techniques. Examples 
of combinatorial sketching and recovery algorithms include [GGI + 02bl IGCFG021 ICM041 IGKMS031 
IDWBn51 ISBB06bl ISBB06a} IGM06J IGSTV061 IGSTV07} llndUsl IXHuTj and others. 

The typical advantages of the combinatorial approach include fast recovery, often sub-linear in 
the signal length n if k <C n and fast and incremental (under coordinate updates) computation of the 
sketch vector <&x. In addition, it is possible to construct efficient (albeit suboptimal) measurement 
matrices explicitly, at least for simple type of signals. For example, it is known |Ind081 IXH07J how 
to explicitly construct matrices with k2" og gn > measurements, for signals x that are exactly 
A;-sparse. The main disadvantage of the approach is the suboptimal sketch length. 

Geometric approach. This approach was first proposed in the papers |CRT 06. Don06aj and 
has been extensively investigated since then (see |Gro06j for a bibliography). In this setting, the 
matrix $ is dense, with at least a constant fraction of non-zero entries. Typically, each row of 
the matrix is independently selected from a sub-exponential n-dimensional distribution, such as 
Gaussian or Bernoulli. The key property of the matrix $ which yields efficient recovery algorithms 
is the Restricted Isometry Property [CRT06] . which requires that for any ^-sparse vector x we have 
II & x II 2 = (1 i 5)[|x|[2- If a matrix <& satisfies this property, then the recovery process can be 
accomplished by finding a vector x* using the following linear program: 

min ||:e*||i subject to *&x* = $>x. (PI) 

The advantages of the geometric approach include a small number of measurements (0(k log(n/2k)) 
for Gaussian matrices and 0(klog ' ' n) for Fourier matrices) and resiliency to measurement er- 
rorsj. The main disadvantage is the running time of the recovery procedure, which involves solving 
a linear program with n variables and n + m constraints. The computation of the sketch $x can be 



In the non-sketching world, such methods algorithms are often called "weak greedy algorithms" , and have been 
studied thoroughly by Temlyakov |Tem02] 

o 

Historically, the geometric approach resulted also in the first deterministic or uniform recovery algorithms, where 
a fixed matrix <l? was guaranteed to work for all signals x. In contrast, the early combinatorial sketching algorithms 
only guaranteed 1 — 1/n probability of correctness for each signal x. However, the papers [GSTV06 , GSTV07] showed 
that combinatorial algorithms can achieve deterministic or uniform guarantees as well. 
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done efficiently for some matrices (e.g., Fourier); however, an efficient sketch update is not possible. 
In addition, the problem of finding an explicit construction of efficient matrices satisfying the RIP 
property is open [Tao07| ; the best known explicit construction [DeV07| yields 0(A: 2 ) measurements. 

Connections. There has been some recent progress in obtaining the advantages of both ap- 
proaches by decoupling the algorithmic and combinatorial aspects of the problem. Specifically, the 
papers |NV07l IDM081 INT08J show that one can use greedy methods for data compressed using 
dense matrices satisfying the RIP property. Similarly [GLR08] . using the results of |KT07j . show 
that sketches from (somewhat) sparse matrices can be recovered using linear programming. 

The best results (up to O(-) constants) obtained prior to this work are shown in Figure Qfll 
We ignore some aspects of the algorithms, such as explicitness or universality of the measurement 
matrices. Furthermore, we present only the algorithms that work for arbitrary vectors x, while many 
other results are known for the case where the vector x itself is exactly /c-sparse; e.g., see |TG05[ 
IDWB 05 . SBB06b, Don06a] IXH07| . The columns describe: citation; whether the recovery algorithms 
hold with high probability for All signals or for Each signal; sketch length; time to compute &x 
given x; time to update 3>x after incrementing one of the coordinates of x; timqj to recover an 
approximation of x given $x; approximation guarantee; and whether the algorithm is robust to 
noisy measurements. In the approximation error column, £ p < A£ q means that the algorithm 
returns a vector x* such that \\x— x*|L < Amm x i \\x — x'\\ q , where x' ranges over all /c-sparse vectors. 
In [CDD06J . the authors show that an approximation guarantee of the form "^2 < 7372^1" implies 
a "^1 < (1 + 0(C))£i" guarantee, and that it is impossible to achieve "£2 < CI2 deterministically 
(or for all signals simultaneously) unless the number of measurements is Q(n). The parameters 
C > 1, c > 2 and a > denote absolute constants, possibly different in each row. We assume that 
k < n/2. 

In addition, in Figure [2] we present very recent results discovered during the course of our 
research. Some of the running times of the algorithms depend on the "precision parameter" D, 
which is always bounded from the above by ||x||2 if the coordinates of x are integers. 

1.1. Our results. In this paper we give a sequence of results which indicate that the combinatorial 
and geometric approaches are, in a rigorous sense, different manifestations of a common underlying 
phenomenon. This enables us to achieve a unifying perspective on both approaches, as well as 
obtaining several new concrete algorithmic results. 

We consider matrices which are binary and sparse; i.e., they have only a small number d of ones 
in each column, and all the other entries are equal to zero. It has been shown recently |Cha08j that 
such matrices cannot satisfy the RIP property with parameters k and S, unless the number of rows 
is Q,(k 2 ). Our first result is that, nevertheless, such matrices satisfy a different form of the RIP 
property, that we call the RIP-p property, where the £2 norm is replaced by the £ p norm. Formally, 
the matrix <1? satisfies RIPp^.^ property if for any fc-sparse vector x we have ||$x|| p = (1 ± <5)||a;|L. 
In particular, we show that this property holds for 1 < p < 1 + 0(1)/ log n if the matrix $ is 
an adjacency matrix of a high-quality unbalanced expander graph. Thus we have a large class of 
natural such measurement matrices. We also exhibit an RIP-2 matrix that is not an RIP-1 matrix, 
so that, with [Cha08| . we conclude that these two conditions are incomparable — neither one is 
stronger than the other. 



Some of the papers, notably |CM04| . are focused on a somewhat different formulation of the problem. However, 
it is known that the guarantees presented in the table hold for those algorithms as well. See Lecture 4 in |Ind07| for 
a more detailed discussion. 

In the decoding time column LP=LP(n, m,T) denotes the time needed to solve a linear program defined by 
an m x n matrix 4? which supports matrix-vector multiplication in time T. Heuristic arguments indicate that 
LP(n, m, T) ~ y/nT if the interior-point method is employed. In addition, the paper |NV07] does not discuss the 
running time of the algorithm. Our bound is obtained by multiplying the number of algorithm iterations (i.e., k) by 
the number of entries in the matrix 3? (i.e., nk log c n). See |NT08| for an in-depth discussion of the running times of 
OMP-based procedures. 
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FIGURE 1 . Summary of the best prior results. 
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FIGURE 2. Recent work. 



Theorem 1. Consider any m x n matrix $ that is the adjacency matrix of an (k,e) -unbalanced 
expander G = (A,B,E), \A\ = n, \B\ = m, with left degree d, such that l/e,d are smaller than n. 
Then the scaled matrix $ /d l ' p satisfies the RIP Pj fc : 5 property, for 1 < p < 1 + 1/ log n and 5 = Ce 
for some absolute constant C > 1. 

The fact that the unbalanced expanders yield matrices with RIP-p property is not an accident. 
In particular, we show in Section [2] that any binary matrix $ in which each column has d onesj 
and which satisfies RIP-1 property with proper parameters, must be an adjacency matrix of a 
good unbalanced expander. That is, an RIP-p matrix and the adjacency matrix of an unbalanced 
expander are essentially equivalent. Therefore, RIP-1 provides an interesting "analytic" formulation 
of expansion for unbalanced graphs. Also, without significantly improved explicit constructions of 
unbalanced expanders with parameters that match the probabilistic bounds (a longstanding open 
problem), we do not expect significant improvements in the explicit constructions of RIP-1 matrices. 



In fact, the latter assumption can be removed without loss of generality. The reason is that, from the RIP-1 
property alone, it follows that each column must have roughly the same number of ones. The slight unbalance in the 
number of ones does not affect our results much; however, it does complicate the notation somewhat. As a result, we 
decided to keep this assumption throughout the paper. 
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Theorem 2. Consider any m x n binary matrix $ such that each column has exactly d ones. If 
for some scaling factor S > the matrix S& satisfies the RIPi s 5 property, then the matrix 3> is 
an adjacency matrix of an (s,e) -unbalanced expander, for 

;i- T i 7 )/(2-v2). 

In the next step in Section[3l we show that the RIP-1 property of a binary matrix (or, equivalently, 
the expansion property) alone suffices to guarantee that the linear program Pi recovers a good sparse 
approximation. In particular, we show the following 

Theorem 3. Let & be anmxn matrix of an unbalanced (2k, e)- expander. Let a(e) = (2e)/(l — 2e). 
Consider any two vectors x,x*, such that Qx = $>x*, and ||a;*||i < ||aj||i- Then 

\\x — x*||i < 2/(1 — 2a(e)) • \\x — Xk\\i- 

where Xk is the optimal k-term representation for x. 

We also provide a noise-resilient version of the theorem; see Section [3] for details. 

By combining Theorem [3] with the best known probabilistic constructions of expanders (Sec- 
tion [2j) we obtain a scheme for sparse approximation recovery with parameters as in Figure [TJ The 
scheme achieves the best known bounds for several parameters: the scheme is deterministic (one 
matrix works for all vectors x), the number of measurements is 0(klog(n/k)), the update time is 
0(\og(n/k)) and the encoding time is 0(nlog(n/k)). In particular, this provides the first known 
scheme which achieves the best known measurement and encoding time bounds simultaneously. In 
contrast, the Gaussian and Fourier matrices are known to achieve only one optimal bound at a 
time. The fast encoding time also speeds-up the LP decoding, given that the linear program is 
typically solved using the interior-point method, which repeatedly performs matrix-vector multipli- 
cations. In addition to theoretical guarantees, random sparse matrices offer an attractive empirical 
performance. We show in Section d] that the empirical behavior of binary sparse matrices with LP 
decoding is consistent with the analytic performance of Gaussian random matrices. 

In the final part of the paper, we show that adjacency matrices of unbalanced expanders can 
be augmented to facilitate sub-linear time combinatorial recovery. This fact has been implicit 
in the earlier work [GSTV071 IInd08] ; here we verify that indeed the expansion property is the 
sufficient condition guaranteeing correctness of those algorithms. As a result, we obtain an explicit 
construction of matrices with O(£;2( loglogn ) ) rows that are amenable to a sublinear decoding 
algorithm for all vectors (similar to that in [GSTV07] ) . Previous explicit constructions for sublinear 
time algorithms either had Q,(k 2 ) rows |GM06| or had 0(k2^°^°^° {1) ) rows |Ind08l IXH07] but 
were restricted to fc-sparse signals or their slight generalizations. An additional (and somewhat 
unexpected) consequence is that the algorithm of |Ind08j is simple, effectively mimicking the well- 
known "parallel bit-flip" algorithm for decoding low-density parity-check codes. 

Theorem 4. Let e > be a fixed constant, and p = 1 + 1/logn. Consider x G M n and a sparsity 
parameter k. There is a measurement matrix &, which we can construct explicitly or randomly, 
and an algorithm HHS(p) that, given measurements v = ^x of x, returns an approximation x of x 
with 0(k/e) nonzero entries. The approximation satisfies 

\\X 3/ Tj _^ 6rC \\CC wfa M. ■ 

where Xk is the optimal k-term representation for x. 

Figure [3] summarizes the connections among all of our results. We show the relationship between 
the combinatorial and geometric approaches to sparse signal recovery 
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Geometric Combinatorial 

RIP-2 < ■/- ^RIP-1 



Weak 
programming greedy 

Figure 3. The above diagram captures the relations between the combinatorial and 
geometric approaches and the two main classes of decoding algorithms. Connections 
established in prior work are shown with dashed lines. Our work connects both 
approaches, with the ultimate goal of obtaining the "best of both worlds." 

2. Unbalanced expanders and RIP matrices 

2.1. Unbalanced expanders. In this section we show that RIP-p matrices for p ~ 1 can be 
constructed using high-quality expanders. The formal definition of the latter is as follows. 

Definition 5. A (k, e) -unbalanced expander is a bipartite simple graph G = (A,B,E) with left 
degree d such that for any X C A with \X\ < k, the set of neighbors N(X) of X has size \N(X)\ > 
(l-e)d\X\. 

In constructing such graphs, our goal is to make \B\, d, and e as small as possible, while making 
k as close to \B\ as possible. 

The following well-known proposition can be shown using the probabilistic method. 

Proposition 6. For any n/2 > k > 1, e > 0, there exists a (k,e) -unbalanced expander with left 
degree d = 0(log(n/k)/e and right set size 0(kd/e) = 0{k\og{n / k) / e 2 ) . 

Proposition 7. For any n > k > 1 and e > 0, one can explicitly construct a (k,e) -unbalanced 
expander with left degree d = 2 ' s ^ lo s( n )/ e ))) } l e ft set size n and right set size m = kd/e : ( ' . 

Proof. The construction is given in |CRVW02] . Theorem 7.3. Note that the theorem refers to 
notion of lossless conductors, which is equivalent to unbalanced expanders, modulo representing 
all relevant parameters (set sizes, degree, etc.) in the log-scale. After an additional 0(nd)-time 
postprocessing, we can ensure that the graph is simple; i.e., it contains no duplicate edges. □ 

2.2. Construction of RIP matrices. 

Definition 8. An m X n matrix «& is said to satisfy RIPj, & g if, for any k-sparse vector x, we have 

1 1 x lip — 1 1 ^ x lip — K 1 * u ) ll x llp- 

Observe that the definitions of RIPi^ ,$ and RIP2,fc,<5 matrices are incomparable. In what follows 
below, we present sparse binary matrices with 0(klog(n/k)) rows that are RIPi^; it has been 
shown recently |Cha08| that sparse binary matrices cannot be RIP2,fc,<5 unless the number of rows 
is Q(k 2 ). In the other direction, consider an appropriately scaled random Gaussian matrix G of 
R ~ /clog(n) rows. Such a matrix is known to be RIP2,fc,5. To see that this matrix is not RIPi^^, 
consider the signal x consisting of all zeros except a single 1 and the signal y consisting of all zeros 
except k terms with coefficient 1/k. Then ||x||i = \\y\\i but ||Gx||i ~ vfc||Gy||i. 

Theorem [TJ Consider any m x n matrix $ that is the adjacency matrix of an (k,e) -unbalanced 
expander G = (A,B,E) with left degree d, such that l/e,d are smaller than n. Then the scaled 
matrix $/d}' p satisfies the RIP Pj fc : 5 property, for 1 < p < 1 + 1/logra and 5 = Ce for some absolute 
constant C > 1. 
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Proof. Let x G K n be a /c-sparse vector. Without loss of generality, we assume that the coordinates 
of x are ordered such that |xi| > ... > \x n \. 

The proof proceeds in two stages. In the first part, we show that the theorem holds for the case 
of p = 1. In the second part, we extend the theorem to the case where p is slightly larger than 1. 

The case of p = 1. We order the edges ej = (it,jt), t = 1 . . . dn of G in a lexicographic manner. It 
is helpful to imagine that the edges e±,e2 ■ ■ ■ of E are being added to the (initially empty) graph. 
An edge ej = (it,jt) causes a collision if there exists an earlier edge e s = (i s ,j s ),s < t, such that 
jt = js- We define E' to be the set of edges which do not cause collisions, and E" = E — E' . 

Lemma 9. We have 

y \x.i\ < ed HxIIj 

(i,j)£E" 

Proof. For each t = 1 . . . dn, we use an indicator variable rj G {0, 1}, such that rj = 1 iff ej G -B". 
Define a vector z G M such that z^ = |xj t |. Observe that 

(i,j)eE" e t =(h,jt)eE 

To upper bound the latter quantity, observe that the vectors satisfy the following constraints: 

• The vector z is non-negative. 

• The coordinates of z are monotonically non-increasing. 

• For each prefix set Pi = {l...di}, i < k, we have ||rip. ||i < edi - this follows from the 
expansion properties of the graph G. 

• ripj = 0, since the graph is simple. 

It is now immediate that for any r,z satisfying the above constraints, we have r ■ z < ||^||ie. 
Since ||z||i = d||x||i, the lemma follows. D 

Lemma [9] immediately implies that ll^x^ > d ||^Hi (1 — 2e). Since for any x we have H^x]^ < 
d \\x\\^ it follows that #/d satisfies the RlPi^^e property. 

The case of p < 1 + 1/logn. Let u = &x. We will show that if e is small enough, then the value 
of ||«||p is close to d||x||p. 

We start from a few useful technical claims. 

Claim 10. For any a, b G M, we have \(a + b)\ p > \a\ p — p\a\ p \b\. 

Claim 11. For any a£l and \b\ < \a\, we have \(a + b)\ p < \a\ p + p|2a| p ~ 1 |6|. 

For any j G B, define v!- = x, if (i,j) G E' , and v!- = otherwise. Also, define u 1 ' = Uj — u 1 -. 
Claim 12. We have £V \u"\ p = e(e)d||s||*. 

Proof. By Lemma[9]we know that Y2j \ u 'j\ = Y2(i j)eE' \ x i\ — 0- ~ e )^\\ x \\i- Therefore 
J2 \ u 'j\ P ^ £ \ u 'j\y ^ ( ed INIi) P ^ edPnS 1 - 1 /** \\xf p = @(e)d \\x\\ p p 



D 



Lemma 13. We have \\uE, > (1 - 9(e))d||x||£. 

Proof. By Claim [10] we know that 

n up X - ^ i / i "ip v. V"^ i / in V"^ i / ip— li //i 
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Define the set S = {j : e|u'-| < \u''\}. We have 

Ei / 1 p— 1 1 a \ V^ i / 1 v— 1 1 a \ i V^ i / 1 v— 1 1 ii i ^ /"i / \v— 1 V^ i "ip i V^ i / 1 



< G 



£Ki p + e £Kr 



<e(e)d||cc||£ + e| 



.'IIP 



Therefore 



i up ^ ii ' 

\ U \\r, > \\ U ,, 
I Up — II lip 



/||P e(l)de\\xf-e(e)\\u'\\ p 



|«'||J(i-e( c ))-e(i)(fc||x||5. 



It suffices to bound \\u'\\ p from below. 



,i\\p 



> d\\x\ 



> d\\x\ 



x 



E \ x i\ P ^ d \ 
(ij)e-E" 



9(e)d||x||?> d||x||P-e(e)d||x 




= (1-Q( e ))d\\x\\* 
where we used Lemma [9] in the third line. Altogether 

||u||j > (i - e(e))(i - e(e))d ||x||; - e(i)cfc \\x\\; = (i - e(e))d | 



Lemma 14. We foave ||ti||£ < (1 + 9(e))d ||x||*j. 

Proof. Define the set T = {j : \u"A < \u'A}. Decompose ||u||p into a sum 

jer j$t 

By Claim [11] we know that 

ti = E n p = E K + <r < E Kr +E*«r 1 Ki 

jer jGT jeT jGT 

which, as in the proof of Lemma [13l can be bounded from above by 

J>;r(i + e(e)) + e(e)d||*||£ 

jeT 
At the same time, using Claim [121 

The lemma follows. 

Combining Lemma [141 and Lemma [T3l yields the proof of the theorem. 



□ 



□ 
□ 



The above theorem shows that one can construct RIP-p matrices for p ~ 1 from good unbalanced 
expanders. In following, we show that this is not an accident: any binary matrix $ in which satisfies 
RIP-1 property with proper parameters, and with each column having exactly d ones, must be an 
adjacency matrix of a good unbalanced expander. This reason behind this is that if some set of 
vertices does not expand too well, then there are many collisions between the edges going out of 
that set. If the signs of the coordinates "following" those edges are different, the coordinates will 
cancel each other out, and thus the l\ norm of a vector will not be preserved. 
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The assumption that each column has exactly d ones is not crucial, since the RIP-1 property 
itself implies that the number of ones in each column can differ by at most factor of 1 + 5. All proofs 
in this paper are resilient to this slight unbalance. However, we decided to keep this assumption 
for the ease of notation. 

Theorem [2] Consider any m x n binary matrix $ such that each column has exactly d ones. If 
for some scaling factor S > the matrix S$> satisfies the KIPi jS s property, then the matrix $ is 
an adjacency matrix of an (s,e) -unbalanced expander, for 

-O-r^M 2 -^)' 

Note that, for small values of 5 > 0, we have (l - j^)/{2 - y/2) « 5/(2 - y/2). 

Proof. Let G = (A, B, E) be the graph with adjacency matrix <&. Assume that there exists X C A, 
\X\ = k' < k such that |iV(X)| < dk'(l — e). We will construct two ?i-dimensional vectors y, z such 
that \\y\\i = H^Hi = k' , but ||3>£||i / H^ylli < 1 — e(2 — V2), which is a contradiction. 

The vector y is simply the characteristic vector of the set X. Clearly, we have \\y\\i = k' and 
||3>vlli = dk. 

II & || I 

The vector z is defined via a random process. For i € X, define rj to be i.i.d. random variables 
uniformly distributed over {—1,1}. We define z% = r, if i G X, and zi = otherwise. Note that 

Iklli = IMIi = k '- 

Let C C N(X) be the "collision set", i.e., the set of all j € N(X) such that the number v,j of 
the edges from j to X is at least 2. Let \C\ = I. By the definition of the set C we have Y2 j u j — ^l- 
Moreover, from the assumption it follows that ^ • Uj > 2edk' . 

Let v = $>z. We split v into vc and !)c c - Clearly, ||fc* c |li = k'd— ^ ■ Uj. It suffices to show that 
H^clli is significantly smaller than ^2jUj for some z. 

Claim 15. The expected value of \\vcW2 * s equal to ^2~v,j. 

Proof. For each j E C, the coordinate Vj is a sum of Uj independent random variables uniformly 
distributed over {—1, 1}. The claim follows by elementary analysis. □ 



belli < V^ll u c|l2 — — 7n^- Therefore 



E,- u 3 



By Claim [T5l we know that there exists z such that ||i>c|l2 — a/Si m j — — rJ~- This implies that 

uc|li + ||uc<=|li < -^- + dk ' ~^Z u j = dk' -(1-1/ V2)^2uj 



hlli < 



v/2 



./ 



< d/c' - (1 - l/\/2) • 2edk' = dk'[l - e(2 - V2)} 



D 



3. LP DECODING 

In this section we show that if A is an adjacency matrix of an expander graph, then the LP 
decoding procedure can be used for recovering sparse approximations. 

Let $ be an m x n adjacency matrix of an unbalanced (2k, e)-expander G with left degree d. Let 
a(e) = (2e)/(l - 2e). We also define E(X : Y) = E n (X x Y) to be the set of edges between the 
sets X and Y. 
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3.1. LI Uncertainty Principle. In this section we show that any vector from the kernel of a 
an adjacency matrix <1? of an expander graph is "smooth", i.e., the i\ norm of the vector cannot 
be concentrated on a small subset of its coordinates. An analogous result for RIP-2 matrices and 
with respect to the £2 norm has been used before (e.g., in [KT07J ) to show guarantees for LP-based 
recovery procedures. 

Lemma 16. Consider any y £ W 1 such that <&y = 0, and let S be any set of k coordinates of y. 
Then we have 

\\ysh < «(e)l|y||i- 

Proof. Without loss of generality, we can assume that S consists of the largest (in magnitude) 
coefficients of y. We partition coordinates into sets So, Si, S2, ■ ■ ■ St, such that (i) the coordinates 
in the set Si are not larger (in magnitude) than the coordinates in the set S/_i, I > 1, and (ii) all 
sets but St have size k. Therefore, So = S. Let <&' be a submatrix of 3> containing rows from N(S). 
From the equivalence of expansion and RIP-1 property we know that ||^'ys||i = ||<&ys||i > 
d(l — 2e)||ys||i. At the same time, we know that ||<&'y||i = 0. Therefore 

= ||*'y||i > \\&y s \\i-^2 Yl l yi ' 

i>i (i,j)eE,ies h jeN(S) 

> d(l - 2e)[|y 5 ||i " 2 \E(Sl ■ N(S))\ min \y t \ 

Fi ieSl - 1 



> d(l - 2e)||y 5 ||i ~IJ2 \E(Si : N(S))\ • hs^h 



k 
l>\ 

From the expansion properties of G it follows that, for I > 1, we have |iV(S'U»S'z)| > d(l — e)\SuSi\. 
It follows that at most de2k edges can cross from Si to N(S), and therefore 

> d(l-2e)\\y s \\ 1 -^\E(Si:N(S))\-\\y Sl _ 1 \\i 

l>i 

> d(l-2e)|||/ s ||i-efc2^||i/ s ,_ 1 ||i/A! 

l>i 

> d(l-2e)||y5||i-2de||i/|| 1 

It follows that d(l - 26)112/511! < 2de||y||i, and thus \\y s \\i < (2e)/(l - 2e)||y||i. D 

3.2. LP recovery. The following theorem provides recovery guarantees for the program Pi, by 
setting u = x and v = x* . 

Theorem [3] Consider any two vectors u, v, such that for y = v — u we have $>y = 0, and \\v\\i < 
1 1 tx ||i- Let S be the set of k largest (in magnitude) coefficients of u, then 

\\v — u\\\ < 2/(1 — 2a(e)) • \\u — us\\i 



Proof. We have 



\\u\\i > \\v\\i = \\(u + y)s\\i + \\(u + y)s°\\i 

> \\ u s\\i ~ \\ys\\i + hsc\\i - \\usA\i 
= \\u\\i - 2||usc||i + \\y\\i - 2\\y s \\i 

> ||u||i-2||« 5 c|| 1 + (l-2a(e))||i/||i 
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where we used Lemma [16] in the last line. It follows that 

2||u<H| 1 >(l-2a(e))||y||i 



|«-«s||i < 2/(1 - 2a(e)) • \\u s 4i + 



d(l-2e)(l-2a) 



Proof. Analogous to the proof of Theorem [3l 

4. Experimental Results 



□ 



Theorem 17. Consider any two vectors u, v, such that for y = v — u we have ||3?y||i = P > 0, and 
\\ v \\i < IMIi- Let & be the set of k largest (in magnitude) coefficients of u. Then 

2/3 



□ 



1 

0.9 
0.8 
0.7 
0.6 
a- 0.5 
0.4 
0.3 
0.2 
0.1 




Probability of exact recovery, signed signals 



Probability of exact recovery, positive signals 




0.2 



0.4 



0.6 




FIGURE 4. Probability of correct signal recovery of a random fc-sparse signal 
x C { — 1,0,1}™ (left) and x € {0,1}™ (right) as a function of k = pm and m = Sn, for 
n = 200. The probabilities were estimated by partitioning the domain into 40 x 40 data 
points and performing 50 independent trials for each data point, using random sparse matri- 
ces with d = 8. The thick curve demarcates a phase transition in the ability of LP decoding 
to find the sparsest solution to Gx» = Gx for 67 a Gaussian random matrix (see |DT06j ). 
The empirical behavior for binary sparse matrices is consistent with the analytic behavior 
for Gaussian random matrices. 

Our theoretical analysis shows that, up to constant factors, our scheme achieves the best known 
bounds for sparse approximate recovery. In order to determine the exact values of those constant 
factors, we show, in Figure [H the empirical probability of correct recovery of a random A;-sparse 
signal of dimension n = 200 as a function of k = pm and m = 5n. As one can verify, the empirical 
O(-) constants involved are quite low. The thick curve shows the analytic computation of the phase 
transition between the survival of typical /-faces of the cross-polytope (left) and the polytope (right) 
under projection by G a Gaussian random matrix. This line is equivalent to a phase transition in the 
ability of LP decoding to find the sparsest solution to Gx* = Gx, and, in effect, is representative 
of the performance of Gaussian matrices in this framework (see |Don06b] and |DT06| for more 
details). Gaussian measurement matrices with m = 5n rows and n columns can recover signals 
with sparsity k = pm below the thick curve and cannot recover signals with sparsity k above 
the curve. This figure thus shows that the empirical behavior of binary sparse matrices with LP 
decoding is consistent with the analytic performance of Gaussian random matrices. Furthermore, 
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the empirical values of the asymptotic constants seem to agree. See |BI08| for further experimental 
data. 



5. SUBLINEAR-TIME DECODING OF SPARSE VECTORS 

In this section we focus on the recovery problem for the case where the signal vector x is /c-sparse. 
The algorithm given here is essentially subsumed by the HHS algorithm provided in the Appendix 
(modulo the slightly higher reconstruction time and the number of measurements of the latter). 
The advantage of the algorithm from this section, however, is that it is very simple to present as 
well as analyze. This will enable us to illustrate the basic concepts without getting into technical 
details needed for the case of general signals x. 

The algorithm we present here is essentially a simplification of the algorithm given in |Ind08| . 
The main difference is the assumption about the measurement matrix <&. In [Ind08| the matrix 
3> was assumed to be an "augmented" version of the adjacency matrix of an extractor. Here, 
we assume that is an "augmented" version of the adjacency matrix of an unbalanced expander 
(or alternatively, by Theorem [21 any binary matrix that satisfies an RIP-1 property). The latter 
assumption turns out to be the "right" one, resulting in further simplification of the algorithm. 

To define the matrices, we start with a straightforward definition. If q and r are 0-1 vectors, we 
can view them as masks that determine which entries of a signal appear and which are zeroed out. 
For example, the signal qo x consists of the entries of x restricted to the nonzero components of 
q. The notation o indicates the componentwise or Hadamard product of two vectors. Given 0-1 
matrices Q and R, we can form a matrix that encodes sequential restrictions by all pairs of their 
rows. We express this matrix as the row tensor product Q (g> r R. 

The construction of the measurement matrix is as follows. Let G = (A,B,E) be a (k,e)- 
unbalanced expander, e = 1/8, with left degree d, \A\ = n, \B\ = m. Without loss of generality we 
can assume that m is a power of 2. Let \1/ be the adjacency matrix of G. The measurement matrix 
$ = \1/ ® r B has m log n rows. The matrix B is the bit-test matrix] its tth column contains the 
binary expansion of t. That is, For t = . . . log n — 1 and j = . . . m — 1, let I = t + j log n. The 
(/ + l)-th row <&z + i of <& is such that, for any i = . . . m — 1, (<&; + i)j + i = (*&)j + i • hm t (i), where 
him. (i) is the t-th least significant bit in the binary representation of i. 

The purpose of the above augmentation is as follows. Consider a specific /c-sparse vector x. 
Let supp(x) denote the the support of x, i.e., the set of indices of the non-zero coordinates of x. 
Assume that the (j + l)-th row r of the matrix \1/ is such that | supp(x) PI supp(r)| = 1, and let 
i + 1 G supp(x) n supp(r). Then, from the values of <&ji og n+i£> • • • > ^jiogn+iognX, we can recover 
both i and x»+i. This is because for each of those rows, the value of $>jiogn+t+ix is equal to 
if bhit(i) = 0, or a?j+i otherwise; moreover, at least one of the values is non-zero. Of course, if 
| supp(x) n supp(r)| 7^ 1, then the algorithm might "recover" an incorrect pair (i, val). This can be 
detected if val = 0; otherwise, the error might be undetected. 

Theorem 18. Let & be a matrix described above. Then for any k-sparse x, given 3>x, we can 
recover x in time 0(mlog n). 

Proof. The main component of the recovery algorithm is the procedure Reduce($>x), which returns 
a vector y such that \\x — y||o < ||x||o/2. The procedure maintains a vector i>otes[-] of multisets. 
Initially, all entries are set to 0. 
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Reduce ($x) : 






for j = 1 . . . m 






compute (I, val) such that if supp($)j n supp(x) = 


{i} then Xi 


= val 


if val 7^ then votes[l] = votes[l] U {val} 






y = 






for i = 1 ... 7i such that votes[i] ^ 






if votes[i] contains > d/2 copies of val then yi 


= val 




return y 







Claim 19. For any k-sparse vector x, the procedure Reduce returns a vector y such that x — y is 
k/2-sparse. 

Proof. From the expanding properties of G, it follows that at most ed||x||o rows of \I/ return an 
incorrect pair (i, val). Each set of d/2 incorrect pairs can change the outcome for at most 2 positions 
of y. Thus, at most 4e||x||o = ||x||o/2 positions of y can be incorrect. □ 

The final algorithm invokes Recover($x). 



Recover (<J?x) 

if <&£ = then return 

y = Reduce ($x) 

{ We now need to recover x — y} 

z =Recover($(x — y)) 

return y + z 



□ 



6. Conclusion 

We show in this paper that the geometric and the combinatorial approaches to sparse signal 
recovery are different manifestations of a common underyling phenomenon. Thus, we are able 
to show a unified perspective on both approaches — the key unifiying elements are the adjacency 
matrices of unbalanced expanders. 

In most of the recent applications of compressed sensing, a physical device instantiates the 
measurement of x and, as such, these applications need measurement matrices which are conducive 
to physical measurement processes. This paper shows that there is another, quite different, large, 
natural class of measurement matrices, combined with the same (or similar) recovery algorithms 
for sparse signal approximation. These measurement matrices may or may not be conducive to 
physical measurement processes but they are quite amenable to computational or digital signal 
measurement. Our work suggests a number of applications in digital or computational "sensing" 
such as efficient numerical linear algebra and network coding. 

The preliminary experimental analysis exhibits interesting high-dimensional geometric phenom- 
ena as well. Our results suggest that the projection of polytopes under Gaussian random matrices 
is similar to that of projection by sparse random matrices, despite the fact that Gaussian random 
matrices are quite different from sparse ones. 

Acknowledgments: The authors would like to thank: Venkat Guruswami and Salil Vadhan, 
for their help on the expanders front; David Donoho and Jared Tanner for providing the data for 
the analytic Gaussian treshold curve in Figure HJ Justin Romberg for his help and clarifications 
regarding the ^i-MAGIC package; and Tasos Sidiropoulos for many helpful comments. 
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Appendix A. HHS(p) decoding 

In this section we focus on the general recovery problem for an arbitrary vector x. We show 
that, as in the previous algorithm in Section [5J we can use the adjacency matrices of unbalanced 
expanders, suitably augmented and concatenated, to obtain an explicit construction of matrices 
that are designed for the sub- linear time combinatorial recovery algorithm HHS in [GSTV07] . 
We call this modified algorithm the HHS(p) algorithm. For the sake of exposition, we highlight 
the necessary changes in the construction of the matrices and the algorithm and summarize the 
remaining, unchanged portions of the algorithm. Similarly, for the analysis of HHS(p), we present 
only those portions which are significantly different from the original analysis and summarize those 
which are not. We establish the following result. 

Theorem [4] Let x G R n and fix a sparsity parameter k and a number e £ (0,1). There is a 
measurement matrix \1/, which we can construct explicitly or randomly, with the following property. 
The HHS(p) algorithm, given measurements v = \I/x of x, returns an approximation x of x with 
0{k/e) nonzero entries. The approximation satisfies 

\\x — x\\ p < ek 1 '? -1 ^ — Xk\\i- 

Let R denote the size of the measurements for either an explicit or random construction. Then, 
the HHS(p) algorithm runs in time poly(R). 

For explicit constructions R = 0(/c2 loglogn ) and for random constructions R = 0(k polylog n). 

Corollary 20. If we truncate the output x of the algorithm to the k largest terms, then the modified 
output xt satisfies 

\\ x — %k\\p < \\ x — x k\\p + 2e& ' p ~ \\x — Xfc||i and 
\\x - x fc ||i < (1 + 3e)||x - x fc ||i. 

A.l. The measurement matrices. This subsection describes how to construct the measurement 
matrix \l/ '. We note that this construction has the same super-structure as the random construction 
in [GSTV07] and, as such, we highlight the necessary changes while summarizing the similar pieces. 
For ease of exposition, we set e = 1. The matrix ^ consists of two pieces: an identification matrix 
fi and an estimation matrix 3>. We use the first part of the sketch to identify indices of significant 
components of the signal quickly and then we use the second part to estimate the coefficients of 
those terms. 

We use concatenated copies of the (normalized) adjacency matrices T of (fc'je^-unbalanced 
expanders with left degree d. In what follows, we let the sparsity parameter k' vary but fix 
p = 1 + l/log(n). Normalize by the factor d~ l ' p . By Theorem [U such matrices satisfy the 
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RIP p ,fc',<5 property for p = 1 + 1/ logn and 6 < Ce'. Fix e' so that 4e' + l/(2d l / p ) < 1/2. Finally, we 
note that the number of rows in such a matrix is k'2" oglogn > for known explicit constructions of 
such matrices and is k! polylog(ra) for random constructions. To simplify our accounting below, we 
denote the number of rows by R for either the explicit or the random constructions. 

A. 1.1. Identification matrix. The identification matrix ft is a 0-1 matrix with dimensions 
O (i? poly log (re)) x n. It consists of a combination of a three structured matrices. Formally, fi 
is the row tensor product ft = B ® r A. The bit-test matrix B has dimensions O(logra) x n, and 
the isolation matrix A has dimensions O(-Rpolylogn) x n. 

The isolation matrix A is a 0-1 matrix with a hierarchical structure. It consists of log (A;) blocks 

A^> labeled by j = 1, 2, 4, 8, ... , J, where J = k. Each block, in turn, has further substructure as 

()) (i) (i) 

a row tensor product of a collection of 0-1 matrices: Af y s = Rf <S> r S s for s = 1, 2, 4, . . . , k and 

(i) (i) 

r = 2s, 4s, 8s, . . . , n. The first matrix SI is called the sifting matrix and the second matrix Rf 

(i) 
is called the noise reduction matrix. Each sifting matrix Ss is the normalized adjacency matrix 

of an (s, e')-unbalanced expander (or, equivalently, a normalized 0-1 RIP PiSi< 5 matrix). 

(i) 
The purpose of the noise reduction matrix Rf' is to attenuate the noise in a signal that has 

a single large component. It is also a 0-1 valued matrix constructed as follows. For each pair of 

indices (r, s), let (3 ~ r/s be a prime power corresponding to a finite field of size (3. Then we set each 

noise reduction matrix to be the Nisan-Wigderson generator over the field of size (5. That is, we 

index the rows of the matrix by by pairs (a, b) of field elements and index columns by polynomials 

q of degree at most polylog(n). Position ((a,b),q) is 1, if q(a) = b, and 0, otherwise. Such a 

construction produces a matrix with 0(/3 2 ) rows. See |DeV07] or [CM06J for similar constructions 

of similar sizes. 

(i) (i) (i) 

We note that the dimensions of the product of matrices ArJ = Rf ® r S s are the critical ones 

(not the dimensions of the individual matrices). Each block is of dimension 0(i?polylog(n)) x n. 

A. 1.2. The estimation matrix. The estimation matrix $ is an RIPp^^ matrix (or, equivalently, the 
adjacency matrix of an {K, e)-unbalanced expander). The parameter K = 0(k polylog(n)) specifies 
how many spikes are identified during the identification stage. To ensure that the columns of $ 
have unit £ p norm, we scale the matrix appropriately. 

A. 2. The HHS(p) algorithm. The structure of the HHS(p) algorithm remains unchanged. We 
include pseudo-code in Figure [5] for completeness. This algorithm is an iterative procedure with 
several steps per iteration, including identifying a small set of signal positions that contain a 
substantial fraction of the £ p weight, then estimating their values, and finally subtracting their 
encoded contribution from the initial measurements. 



A. 3. Proof sketches. The goal of the algorithm is to identify a small set of signal components 
that carry most of the p-energy in the signal and to estimate the coefficients of those components 
well. We argue that, when our signal estimate is poor, the algorithm makes substantial progress 
toward this goal. We focus on the analysis of the algorithm in the case when our signal estimate is 
poor as this is the critical case. More precisely, assume that the current approximation a satisfies 



> 



k^-'Wx-XkW,. (A.l) 



Then one iteration produces a new approximation a llcw for which \\x — a ncw |L < \ \\x — a\\ . In 
this fashion, the algorithm improves the ^,-norm of the error geometrically until it is small enough. 
Contrast this estimate with that of the previous section (where we reduced the £q norm of the error 
at each step). We will show this major result in a series of steps: 
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Algorithm: HHS(p) Pursuit 

Inputs: The number k of spikes, the HHS measurement matrix *& , 

the initial sketch v = Vl/x, and A the dynamic range of x 
Output: A list L of 0(/c) spikes 

For each iteration i = 0, 1, . . . , 0(log A) { 
For each scale j = 1,2, 4, . . . , O(fe) { 
Initialize L' = 0. 
For each row of A^> { 

Use the O(logn) bit tests to identify one spike location 

} 

Retain a list L'- of the spike locations 

that appear £l(dj) times each 

Update L' < — 1/ U L'j 

} 

Estimate coefficients for the indices in L' by forming <&^,s C st 

with Jacobi iteration 
Update L by adding the spikes in L' 

If an index is duplicated, add the two values together 
Prune L to retain the 0(A:) largest spikes 
Encode these spikes with measurement matrix \I/ 
Subtract encoded spikes from original sketch v to form 

a new residual sketch s 
} 



Figure 5. Pseudocode for the HHS(p) Pursuit algorithm 

(1) First, we obtain a fundamental relationship between the large and the small entries in a 
signal in Lemmas [2lT 1221 

(2) Next, we show in Lemma [23] that the sifting matrix isolates significant coefficients in a 
moderate amount of noise. 

(3) Lemma [271 proves that the noise reduction matrix reduces the £i-norm of the noise enough 
so that we can accurately identify the indices of the significant coefficients. 

(4) Finally, we use a small RIPp^s matrix to estimate the values of the coefficients in Lemma [28l 

We describe some generic properties of signals that are important in the analysis. Given a signal 
g, we write gt to denote the signal obtained by zeroing out all the components of g except the t 
largest in magnitude (break ties lexicographically). We refer to gt as the head of the signal; it is 
the best approximation of the signal using at most t terms with respect to any monotonic norm 
(such as lp). The vector g — gt is called the tail of the signal. First, we show that the £ p norm of 
the tail of a signal is much smaller than the t\ norm of the entire signal. 

Lemma 21. For any signal g, it holds that 

Proof. Assume that the terms g{i) of g are arranged in decreasing order |g(l)| > |<?(2)| > ■■■ > 
|<7(ra)|. We observe that each term gii) in g satisfies \g(i)\ < i _1 ||r7||i and we can bound the £ p norm 
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of the tail as 

\\g- 9t\\ P 




□ 
( i \ x i p 

We note that for p = 1 + 1, the constant C p = I — — j- I is bounded below by 1/t and from above 

by \/l 2 so if p = 1 + l/log(n), then log(n) < C p < log 2 (n). 

Let r = x — a denote the residual signal at the current approximation a and let £ = 0(k) be the 
number of terms in the approximation a at each iteration. When condition (jA.ip holds, most of 
the p-energy in the signal is concentrated in its largest components. The next lemma relates the £ p 
norm of the residual to the £ p norm of its tail r — m and to the £\ norm of the entire residual. 

Lemma 22 (Head and Tail). Suppose that (lA.lj) holds. Let £ = 0(k) be the number of terms in 
the approximation a at each iteration. Then the following bounds hold. 

\\r\\ p >C p \\r-r e \\ p (A.2) 

lk||p > Ct/P-^rWx. (A.3) 

Proof. Observe that because the approximation a contains no more than £ terms in each iteration, 
the number of terms in the head of the residual r^ = a — Xf- is not more than £ + k = 0(k). Rather 
than keep careful track of the exact number of terms in each approximation and each residual, we 
will, by abuse of notation, suppress all constants and refer to the number of terms as 0(k) = £. 
Therefore, (lA.lh implies that 

Ikllp = Ik - a\\ P > k^^Wx - x fc ||i = k 1/p - l \\(x -a) + {a- x fc )||i = £ x l v ~ x \\r - r t \\i. (A.4) 

Now, apply Lemma [21] to the residual tail r — r^ with t = £ to obtain 

llr-r/lli^Cp^-^Hr-r/llp. 

If we combine this relation with our previous calculation, we find 

\\r\\ p > C p \\r-rt\\ p 

which is our first desired inequality. 

To prove the second inequality, we use the Cauchy-Schwarz inequality to see that 



\r\ 



> HrJI > ^/v^WrnW-x 
d W 1 t\\v — t- ' « 1" 



p c- \\i y.\\p 



Finally, we add this inequality to (|A.4p to obtain 

lp 



r||„> Cf/P-^MU. 



□ 



Now, we turn to the identification of significant signal entries. We pull these off in bands of 

decreasing magnitude. For exposition, we assume that ||p||i = 1. We can think of the action of one 

(7) 
submatrix S^ as 

S«) :g^[h} h 2 ... h N ], 

mapping each input signal to a collection of output signals. 
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Lemma 23 (Isolation). Let g be a signal and let I be a subset of {1,2, ... ,n} with s < \I\ < 2s. 

Let £=\I\. We apply the sifting operator S^ to g. For at least (1 - p')£ = (1 - (4e' + l/(2d 1/p )) 
of the indices i & I, there is an output signal h of the form 

2.. 
h = giei + v and \\v\\x < -j\\g\\i. 

Proof. To prove this result, we must show a sequence of shorter results. Our goal is to isolate 
significant spikes from one another and to reduce the contribution of the rest of the signal to these 
isolations. 

Definition 24. An (s,r) band is a band of spikes of magnitude between 1/r and l/(2r). There 
is some s (a power of 2) such that the number of spikes in the band is between s and 2s. If the 
p- contribution, sr~ p , of this band to the current residual error \\x — a\\p is greater than the average 

( Y 

band's contribution, j-^l k l ' p 1 \\x — a\\ p J ; i.e., 

sr- p > — M k^-Hx - a\\ p ) 
log n \ I 

then we call this band significant. 

The next lemma tells us how many spikes lie above a significant band. 

Lemma 25. // an (s, r) band is significant, then there are at most s polylog(n) terms of magnitude 
greater than 1/r in the current residual. 

Proof. If the s spikes of magnitude 1/r have p-energy at least as big as s' spikes of magnitude 
1/r' > 1/r, then 

s(l/r)P > s'(l/r') p > s'{l/r) p 

and so the number of larger spikes must be less than s, s' < s, as p > 1. Because there are only 
logn possible s', if we take a union over all s' , we achieve the desired result. □ 

Definition 26. We say that a spike with index i is isolated from other spikes with indices in a set 
I by a matrix T if there is a row in T that has a one in column i and no other one in any column 
in I . 

We are ready to begin the proof of Lemma [231 Our goal is to show that T = Ss isolates from 
each other the vast majority of spikes of magnitude > 1/r, and, hence, isolates the majority of 
those with magnitude equal to 1/r from the larger spikes. 

By the expansion properties of T, at least (1 — 4e')£ distinct spikes in the original signal are 
isolated d/2 times in the measurements. Next, we show that T not only isolates spikes but also 
reduces the t\ norm of any noise, if it is present. We will argue that few of the good output signals 
have a lot of noise. Let u m denote the magnitude of the mth output position. We observe that the 
1 — ► 1 operator norm of the matrix T is d 1_1 ' p . By Markov's inequality, 



*{ 



in 



2„ , 1 I \r^ 

> — \\Q l r < ,, ,, > u m 

~ £ nyni j ~ 2\\g\\x z ^ 



2Nli 
d i-i/p 



9\\i 
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Therefore, there are at least (1 — p)dl/2 isolating rows and no more than -^dr l ' v of the rows are 
corrupted by noise with large l\ norm. This leaves d£/2((l — 4e') — ,\/ p ) = (1 — p')d£/2 good 
rows with a single distinct spike and a small amount of noise. With judicious choice of unbalanced 
expander parameters, we have (1 — p')£ isolated spikes corrupted by a small amount of noise. □ 

The isolation lemma [23] produces signals which consist of single isolated spikes plus some noise. 
In order to apply bit-testing accurately, we must reduce the t\ norm of the noise from a factor 1/s 
of the original t\ norm by a further factor s/r (i.e., to a factor 1/r of the original noise). Assume, 
for exposition, that the original £\ norm is one. To achieve this reduction, we use a noise reduction 
matrix R¥. 



Lemma 27 (Noise reduction). Let h = giCi + v with \\v\\i < 1/s. We apply the noise reduction 

st half of the output signals, we 

5 + u with IMIi < C/r. 



(i) 
operator R = R) to h. In at least half of the output signals, we have signals of the form 



Proof. Consider the submatrix R' of R obtained by extracting the (5 rows of R that have a 1 in 
column i and then removing column i itself. By construction, R' has at most polylog(n) ones in 
any column, so its 1-to-l operator norm is at most polylog(n). Thus the average over rows of R' 
of H-RVIIj is at most P ° - ' gf ~ l/ r P°lyl°g(ra), or a t most l/2r if we incorporate extra log factors 
into (3. 

In addition, the number of rows in the matrix ArJ is s • max(l, (r/s) 2 ), where s < k and 
sr -p > yp— i. if s j r > i ; the number of rows is s < k. If s/r < 1, the number of rows is r 2 /s. 
Observe that sr~ p > k 1 ~ p and r/s > 1 imply 1/r > l/k. Since sr~ p > k l ~ p ', it follows that 

L = -!— < * < k 2 ~ p k p ' x = k. 

D 

The estimation portion of the algorithm is exactly the same as the HHS algorithm. Let L be the 
set of identified candidates, \L\ < K. By the RIP-p property, the submatrix of $ given by columns 
indexed by L is invertible and the inverse $£ is bounded in the appropriate operator norm, we 
can apply <I>^ to &x by Jacobi iteration in time 0(K 2 polylog(n)) to estimate the coefficients in L. 
In the exact case of x-£ = 0, we recover x exactly. Otherwise, the RIP-p property assures that the 
p-norm of the vector of estimates is bounded in terms of the p-norm of x — xk] i-c, this estimation 
process introduces errors that are small compared with the error associated with missing the n — K 
other terms in x. Its proof is a small modification of the proof in [GSTV07] . which itself was 
originally proven by Rudelson. 

Lemma 28. Suppose that $ is an m x n RIP(p, K, 5) matrix. Then the following holds for every 
vector x G M n . 

||*x|| p < {l + 5)(\\x\\ p + K l l p - l \\x\\ 1 

For completeness, we conclude this section by showing that a mixed norm error bound of the 
form £ p < C /k 1 ~ 1 ' p £\ gives us an £\ error bound t\ < C£\. 

Theorem 29. Suppose 

\\% — cc || < ek 1 ' p ~ 1 \\xf- — icHj . 

Then \\xk — x\\ < \\xk — x\\ + 2ek 1 ' p ^ 1 \\xk — x\\ x and \\xk — x\\± < (1 + 3e) \\xk — ^111- 
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Proof. Let H denote supp(xfc) Usupp(xfc), so that \H\ < 2k. Let H c denote the complement of H. 

\\x k -x\\ p < IPfc ~ x )\h\\ p + \\{xk ~x)\hA\ p 

< \\(x k - x)\h\\ p + \\(x- x)\ H \\ p + \\(x k - x)\ H c\\ p 

< \\(x k - x)\ H \\ p + IP - x)\ H \\ p + \\(x k - x)\h4 p 

< \\ x k -x\\ p + 2\\(x- x)\ H \\ p 

< \\x k - x\\ p + 2\\x- x\\ p 



Similarly, 



< \\ x k - x\\ +2ek l / p 1 ||x fe -a;|| 1 



l^fc-»lll = \\{Xk-x)\H\\i + \\(xk-x)\ H A\l 

< \\(x k -x)\ H \\i + ||(£-3;)|h||i + \\(xk - x)\h°\\i 

< \\(xk -x)\h\\i + ||(£-3;)|h||i + ||(xjfc ~x)\h4i 

< \\xk-x\\ 1 + 2\\(x-x)\ H \\i 

< \\ Xk - x \\ 1 + 2(2k) l - 1 ^\\(x-x)\ H \\ p 

< \\x k - x^ + 2(2k) 1 ~ 1/p \\x - x\\ p 

< (1 + 3e) \\xk — x|| x . 



□ 



